Application of improved point-wise mutual information in term extraction
DU Liping, LI Xiaoge, ZHOU Yuanzhe, SHAO Chunchang
Journal of Computer Applications    2015, 35 (4): 996-1000.   DOI: 10.11772/j.issn.1001-9081.2015.04.0996

The traditional Point-wise Mutual Information (PMI) method has the shortcoming of overvaluing the co-occurrence of two low-frequency words. To find a proper value of k for the improved PMI, named PMIk, that overcomes this shortcoming, to solve the problem that terms cannot be extracted correctly from a segmented corpus containing segmentation errors, and to maintain the portability of the term extraction system, a new method combining PMIk with two fundamental rules was put forward to identify terms in an unsegmented corpus. Firstly, 2-gram extended seeds were determined by computing the bonding strength of two adjoining words with the PMIk method. Secondly, whether a 2-gram extended seed could be extended to a 3-gram was determined by computing the bonding strength between the seed and the word in front of it, and between the seed and the word behind it; multi-gram term candidates were then obtained iteratively. Finally, garbage among the term candidates was filtered out using the two fundamental rules to obtain terms. The theoretical analysis shows that PMIk can overcome the shortcoming of PMI when k≥3 (k∈N+). The experiments on a 1 GB SINA finance Blog corpus and a 300 MB Baidu Tieba corpus verify the theoretical analysis, and PMIk outperforms PMI with good portability.
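
The abstract does not state the PMIk formula, so the following is only a minimal sketch that assumes the commonly used definition, log2(p(x,y)^k / (p(x) p(y))), with k = 1 giving plain PMI. The function name `pmi_k`, the corpus size `TOTAL`, and the two sets of counts are hypothetical values chosen for illustration; they are not taken from the paper or its corpora.

```python
import math


def pmi_k(count_xy, count_x, count_y, total, k=1):
    """Return log2(p(x,y)^k / (p(x) * p(y))) from raw counts.

    Assumed definition of PMIk (not given in the abstract); k=1 is plain PMI.
    """
    if min(count_xy, count_x, count_y) <= 0 or total <= 0:
        raise ValueError("counts and total must be positive")
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    # Work in log space so that p_xy ** k cannot underflow for large k.
    return k * math.log2(p_xy) - math.log2(p_x) - math.log2(p_y)


# Hypothetical counts: (co-occurrences, count of x, count of y).
TOTAL = 1_000_000
rare = (1, 1, 1)              # two words seen once, together
frequent = (1000, 2000, 2000)  # two common words that often co-occur

for k in (1, 2, 3):
    print(k, pmi_k(*rare, TOTAL, k), pmi_k(*frequent, TOTAL, k))
```

With these made-up counts, the rare pair scores higher than the frequent pair for k = 1 and k = 2, and lower for k = 3. That matches the direction of the abstract's claim about k≥3 for this one example, but it is not a substitute for the paper's theoretical analysis.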
